In this paper, we focus on the basic form of the autonomous follow-driving problem, with one leader and one follower. A reinforcement-learning-based throttle and brake control approach is developed for the follower vehicle. A near-optimal control law is learned directly by “trial and error” with the neural dynamic programming algorithm. According to the timely updated following state, the learned control...
This paper presents a theoretical framework for human-in-the-loop reinforcement learning and anticipates its application to driving decision making. Currently, the technologies in human-vehicle collaborative driving face great challenges, and do not consider the human-in-the-loop learning framework or driving decision-maker optimization under complex road conditions. The main content of this...
Designing optimal controllers continues to be challenging as systems are becoming complex and are inherently nonlinear. The principal advantage of reinforcement learning (RL) is its ability to learn from the interaction with the environment and provide an optimal control strategy. In this paper, RL is explored in the context of control of the benchmark cart-pole dynamical system with no prior knowledge...
Power optimization based on intelligent algorithms is drawing more and more attention. This article presents a novel low-power optimization strategy based on high-level software power management, employing a Markov process to characterize the real running workload. This article formulates workload characterization and selection with a stochastic process method, and solves the formula using dynamic voltage...
A cooperative multi-agent system enables independent agents to complete complex tasks through coordination and cooperation. Since the dynamics of physical agents are so complex that the learning environment is effectively stochastic, the paper introduces a decentralized multi-agent reinforcement learning (MARL) algorithm, named Decentralized Concurrent Learning with Cooperative Policy Exploration...
Cities are becoming more congested with vehicular traffic, and most traffic control systems are not smart enough to detect and give priority to emergency vehicles. This results in inadequate services delivered by the public emergency agencies, and unnecessary traffic congestion for other road users at intersections. In this paper we present an effective reinforced road traffic control...
Parallel applications are highly irregular and high performance computing (HPC) infrastructures are very complex. The HPC applications of interest herein are timestepping scientific applications (TSSA). Often, TSSA involve the repeated execution of multiple parallel loops with thousands of iterations and irregular behavior. Dynamic loop scheduling (DLS) techniques were developed over time and have...
Bayesian reinforcement learning provides an elegant solution to the optimal tradeoff between exploration and exploitation of the uncertainty in learning. Unfortunately, the size of the learning parameters grows exponentially with the problem horizon. In this paper, we propose a novel Monte Carlo tree search for Bayesian reinforcement learning approach using a compact factored representation, to solve...
In recent years, Dynamic Adaptive Streaming over HTTP (DASH) has gained momentum as an effective solution for delivering videos on the Internet. This trend is further driven by the deployment of existing HTTP cache infrastructures in DASH systems to reduce the traffic load as well as to serve clients better. However, deploying conventional cache servers in DASH systems still suffers from low cache...
We propose a reinforcement learning algorithm, Megh, for live migration of virtual machines that simultaneously reduces the cost of energy consumption and enhances performance. Megh learns the uncertain dynamics of workloads as it goes. Megh uses a dimensionality reduction scheme to project the combinatorially explosive state-action space to a polynomial-dimensional space. These schemes enable...
In an electronic warfare-type scenario, an optimal jamming strategy is vitally important for a jammer with restricted power, and how to find optimal strategies quickly and accurately has become a pressing question. In this paper, we develop a cognitive jammer that can learn the optimal jamming strategies with the proposed algorithm, Greedy Bandits (GB). By interacting with transmitter-receiver pairs continually,...
Reinforcement learning holds the promise of enabling autonomous robots to learn large repertoires of behavioral skills with minimal human intervention. However, robotic applications of reinforcement learning often compromise the autonomy of the learning process in favor of achieving training times that are practical for real physical systems. This typically involves introducing hand-engineered policy...
We introduce an information theoretic model predictive control (MPC) algorithm capable of handling complex cost criteria and general nonlinear dynamics. The generality of the approach makes it possible to use multi-layer neural networks as dynamics models, which we incorporate into our MPC algorithm in order to solve model-based reinforcement learning tasks. We test the algorithm in simulation on...
A reinforcement learning (RL) agent needs a fair amount of experience to find a near-optimal policy. Transfer learning has been investigated as a means to reduce the amount of experience required. Transfer learning, however, requires another similar reinforcement learning task as a transfer source, which can also be costly in the amount of experience required. In this research, we examine the possible...
Policy gradient algorithms are useful reinforcement learning methods which optimize a control policy by performing stochastic gradient descent with respect to controller parameters. In this paper, we extend actor-critic algorithms by adding an ℓ1 norm regularization on the actor part, which makes our algorithm automatically select and optimize the useful controller basis functions. Our method is closely...
Reliability, interoperability and efficiency are fundamental in Wireless Sensor Network deployment. Herein we look at how transmission power control may be used to reduce interference, which is particularly problematic in high-density conditions. We adopt a distributed approach where every node has the ability to learn which transmission power is most appropriate, given the network conditions and...
Modern large-scale computing deployments consist of complex applications running over machine clusters. An important issue there is the offering of elasticity, i.e., the dynamic allocation of resources to applications to meet fluctuating workload demands. Threshold based approaches are typically employed, yet they are difficult to configure and optimize. Approaches based on reinforcement learning...
Routing in scenarios with dynamically changing node locations is challenging and time consuming. In emerging wireless communication networks such as LTE Advanced and 5G, device-to-device communications present such dynamically changing node locations. Mobile ad hoc networks very often exhibit such scenarios as well. In the Internet of Things (IoT), we will come...
A routing algorithm based on the Q-routing paradigm is proposed for dynamically changing ad hoc networks. A technique derived from the Full Echo approach is used to enhance exploration capacity and prevent routing instability under high load conditions. Routing performance is increased by random polling of neighbors according to the local estimates of the average delivery time in the network.
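For orientation, the Q-routing update that this abstract builds on can be sketched as follows. This is a minimal illustration of the textbook Q-routing rule and its Full Echo variant, not the algorithm proposed in the paper: the random neighbor polling described in the abstract is not implemented, and the class name `QRouter`, its parameters, and the delay values are all hypothetical.

```python
from collections import defaultdict


class QRouter:
    """One node's Q-routing table (hypothetical sketch, not the paper's method).

    q[dest][neighbor] estimates the delivery time to `dest` when the
    packet is forwarded through `neighbor`.
    """

    def __init__(self, neighbors, learning_rate=0.5, initial_estimate=0.0):
        self.neighbors = list(neighbors)
        self.learning_rate = learning_rate
        # Lazily create one row of estimates per destination.
        self.q = defaultdict(
            lambda: {n: initial_estimate for n in self.neighbors}
        )

    def best_estimate(self, dest):
        # The value this node reports to neighbors that ask about `dest`.
        return min(self.q[dest].values())

    def next_hop(self, dest):
        # Greedy choice: the neighbor with the lowest estimated delivery time.
        return min(self.neighbors, key=lambda n: self.q[dest][n])

    def update(self, dest, neighbor, queue_delay, link_delay, neighbor_estimate):
        # Move the old estimate toward: local queueing delay + link delay
        # + the neighbor's own best estimate for the rest of the route.
        target = queue_delay + link_delay + neighbor_estimate
        old = self.q[dest][neighbor]
        self.q[dest][neighbor] = old + self.learning_rate * (target - old)

    def full_echo_update(self, dest, neighbor_estimates, queue_delay, link_delay):
        # Full Echo: poll every neighbor and refresh all estimates for `dest`,
        # not only the one for the neighbor the packet was sent to.
        for neighbor, estimate in neighbor_estimates.items():
            self.update(dest, neighbor, queue_delay, link_delay, estimate)


router = QRouter(["b", "c"], learning_rate=0.5)
router.update("d", "b", queue_delay=1.0, link_delay=1.0, neighbor_estimate=2.0)
first_choice = router.next_hop("d")
router.full_echo_update("d", {"b": 0.0, "c": 6.0}, queue_delay=0.0, link_delay=1.0)
second_choice = router.next_hop("d")
```

In this toy run the node first prefers `c`, whose estimate has not yet been updated, and switches to `b` once the Full Echo step refreshes both entries. That illustrates why polling all neighbors improves exploration compared with updating only the chosen next hop.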
Several groups of routing algorithms for dynamically changing networks are developed every year. We introduce an additional parameter, “Battery Life decrease”, into the existing Q-Routing protocol. The Battery Life is reduced in direct proportion to the number of packets transmitted to the node. The efficiency of the Optimized Battery Life Q-Routing protocol is estimated by total loss of network...